ABSTRACT

Disclosed is a cache coherency controller used in a multi- 
processor system. The cache coherency controller reflects a 
cache line including data produced by a preceding thread to 
a cache line including data produced by a succeeding thread. 
On the other hand, the cache coherency controller prevents 
a cache line including data produced by the succeeding 
thread from being reflected to the cache line including data 
produced by the preceding thread. The cache coherency 
controller maintains a sequential order (relationship) among 
threads based on a thread sequence information table and 
thereby maintains data anti-dependence. 

8 Claims, 22 Drawing Sheets 
CACHE COHERENCY CONTROLLER OF CACHE MEMORY FOR MAINTAINING
DATA ANTI-DEPENDENCE WHEN THREADS ARE EXECUTED IN PARALLEL

BACKGROUND OF THE INVENTION

The present invention relates to a cache coherency controller
of a cache memory, and more particularly to a cache coherency
controller of a cache memory for maintaining data
anti-dependence when a plurality of threads having sequential
orders are executed in parallel.

In order to exploit parallelism in a problem, a multi-thread
execution method for dividing a single sequential program into
a plurality of instruction streams (referred to as "threads",
hereinafter) and executing these threads in parallel has been
proposed.

In this multi-thread execution method, threads are generated by
a fork operation. A (parent) thread which performs a fork
operation is called a "preceding thread" and a newly generated
(child) thread is called a "succeeding thread". Threads are
eliminated after performing a prescribed operation in a
multi-thread program. In other words, the generation and the
elimination of threads are repeated. Each thread is allocated
to a processor. In a system physically having a plurality of
processors, a plurality of threads are simultaneously executed.
By allocating a plurality of threads to each processor,
delaying can be concealed by starting another thread when one
thread is put in a waiting state (e.g., caused by a
synchronizing miss, resource contention, or a cache miss) and
accordingly the utilization efficiency of resources can be
increased.

If a sequential program is divided into a plurality of threads
which have sequential execution order, there is a possibility
that a preceding thread may read an erroneous value when a
succeeding thread writes a future value for the same address
before the preceding thread reads data. Such a relationship is
called data anti-dependence. In order to deal with such data
anti-dependence, conventionally, the reading of an erroneous
value has been prevented by storing information regarding all
load or store operations beforehand and performing controlling
so as to prevent the stored data of a succeeding thread from
being used for the loading of a preceding thread.

However, in the prior art, since it is necessary to store the
write address of the succeeding thread beforehand and make a
comparison between the succeeding thread and the preceding
thread, exclusively used and complex hardware must be prepared.

In addition, since the numbers of addresses and data to be
stored differ depending on the characteristics of executed
problems, hardware is useless in a problem in which the number
of times of accessing to a main memory is small. In a problem
in which the number of times of accessing to the main memory is
large, the number of entries for registering addresses/data
becomes short and consequently parallel execution is limited.

SUMMARY OF THE INVENTION

In view of the foregoing problem of the conventional system, an
object of the present invention is to eliminate a data
anti-dependence by a cache memory when a plurality of threads
having sequential orders are to be simultaneously executed in
the same memory space.

In a cache coherency controller according to a first aspect of
the present invention, a cache coherency controller reflects a
cache line including data produced by a preceding thread to a
cache line including data produced by a succeeding thread and
prevents a cache line including data produced by said
succeeding thread from being reflected to said cache line
including data produced by said preceding thread.

With the unique and unobvious structure of the present
invention, data anti-dependences are assured in a cache memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The above-mentioned and other objects, features and advantages
of this invention will become more apparent by reference to the
following detailed description of the invention taken in
conjunction with the accompanying drawings in which:

FIG. 1 is an execution model view illustrating a principle of
the present invention;

FIG. 2 is a block diagram showing the configuration of a first
embodiment of the present invention;

FIG. 3 is a block diagram showing in detail the configuration
of a cache coherency controller of the first embodiment of the
invention;

FIG. 4 is a block diagram showing a line configuration of a
cache memory of the first embodiment of the invention;

FIG. 5 is a flowchart illustrating a processing flow when a
reading miss occurs in the cache coherency controller of the
first embodiment of the invention;

FIG. 6 is a flowchart illustrating a processing flow during a
writing operation performed by the cache coherency controller
of the first embodiment of the invention;

FIG. 7 is a block diagram showing the configuration of a second
embodiment of the present invention;

FIG. 8 is a block diagram showing a directory table/main memory
of the second embodiment of the invention;

FIG. 9 is a flowchart illustrating a processing flow when
[illegible] a cache coherency controller of the second
embodiment of the invention;

FIG. 10 is a flowchart illustrating a processing flow during
[illegible] coherency [illegible] embodiment of the invention;

FIG. 11 is a block diagram showing the configuration of a third
embodiment of the present invention;

FIG. 12 is a block diagram showing the configuration of a cache
memory of the third embodiment of the invention;

FIG. 13 is a block diagram showing in detail the configuration
of a cache of the third embodiment of the invention;

FIG. 14 is a flowchart illustrating a processing flow during a
reading operation performed by a protocol sequencer of the
third embodiment of the invention;

FIG. 15 is a flowchart illustrating a processing flow during a
writing operation performed by the protocol sequencer of the
third embodiment of the invention;

FIG. 16 is a view showing a status example of a cache entry of
the third embodiment of the invention;

FIG. 17 is a view showing another status example of the cache
entry of the third embodiment of the invention;

FIG. 18 is a block diagram showing a line configuration of a
cache memory of a fourth embodiment of the present invention;

FIG. 19 is a view showing an operation example of the cache
memory of the fourth embodiment of the invention;

FIG. 20 is a view showing another operation example of the
cache memory of the fourth embodiment of the invention;
FIG. 21 is a view showing yet another operation example of the
cache memory of the fourth embodiment of the invention; and

FIG. 22 is a view showing yet further operation example of the
cache memory of the fourth embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A cache coherency controller in accordance with preferred
embodiments of the present invention will be described in
detail with reference to the accompanying drawings.

First, the principle of the invention will be described by
referring to FIG. 1.

Referring to FIG. 1, according to the invention, an operation
is guaranteed by a cache memory such that a value before the
execution of a writing operation by a thread 2 can be read for
the reading operation of a thread 1 as a preceding thread
(e.g., reading of an address 100) even when the thread 2
following the thread 1 executes a writing operation (e.g.,
writing in the address 100) in a preceding manner (in a
physical time sequence).

Referring to FIG. 2, the multi-processor system of the first
embodiment of the invention includes four processors #0 to #3
(2a to 2d), cache memories #0 to #3 (3a to 3d) respectively
corresponding to the processors, cache coherency controllers
(4a to 4d) and thread identifier registers #0 to #3 (5a to 5d).
The processors 2a to 2d are connected to a thread management
unit 1. The four processors are shown in FIG. 2 only for the
purpose of illustrating the invention. It should thus be
understood that the number of processors is not limited to
four.

Each of the processors 2a to 2d is connected to any of the
corresponding cache memories 3a to 3d. The cache memories 3a to
3d are respectively connected to a common bus 6 via the cache
coherency controllers 4a to 4d.

The common bus 6 comprises a data bus 7, an address bus 8 and a
control signal bus 9. The control signal bus 9 transmits
signals indicating the meanings, and so on, of signals on the
data bus 7 and the address bus 8.

The cache coherency controllers 4a to 4d are also connected to
a thread sequence information transmission bus 10 and the
thread identifier registers 5a to 5d provided corresponding to
the controllers.

The thread management unit 1 manages the generation and the
eliminating of threads. The circuit 1 allocates thread
identifiers (ID) to threads in order of generation and notifies
the thread identifier register 5 of these thread identifiers.
The processor, the thread identifier register and the cache
coherency controller will be denoted by numerical references 2,
3 and 4 respectively, hereinafter, unless specified otherwise.

The thread identifier register 5 holds the thread identifiers
notified by the thread management unit 1 until the end of the
threads.

The thread sequence information transmission bus 10 transmits
information (thread sequence information) regarding a thread
sequence from the thread management unit 1 to the cache
coherency controllers 4a to 4d.

This thread sequence information reflects a value held by the
thread identifier register 5. A main memory 11 is shared by the
processors 2a to 2d and these elements are connected together
via the bus 6.

The cache coherency controller 4 performs controlling so as to
maintain coherency among the cache memories 3 based on a
request from the corresponding processor 2 and a signal
transmitted through the common bus 6.

Referring to FIG. 3, the cache coherency controller 4 comprises
thread sequence information table 12, comparators 13a, 13b and
13c and a cache coherency maintenance protocol sequencer 14.
The cache coherency controllers 4a to 4d have structures
identical to one another.

The thread sequence information table 12 receives thread
sequence information (i.e., sequence of thread identifiers)
allocated from the thread management unit 1 via the thread
sequence information transmission bus 10 and stores the
information.

The comparator 13a refers to the index of the thread sequence
information stored in the thread sequence information table 12
so as to search information coincident with the thread
identifier held by the thread identifier register 5 and outputs
its sequence location if coincidental information exists. The
comparator 13b refers to the index of thread sequence
information stored in the thread sequence information table 12
so as to search information coincident with a master thread
identifier on the control signal bus 9 and outputs its sequence
location if coincidental information exists. The comparator 13c
compares the output of the comparator 13a with that of the
comparator 13b and transmits information regarding a preceding
thread to the cache coherency maintenance protocol sequencer
14. The cache coherency maintenance protocol sequencer 14
controls the protocol between a local cache memory and another
cache memory based on the information transmitted from the
comparator 13c.

Referring to FIG. 4, a cache line 18 as one entry in the cache
memory 3 comprises a status bit 15, an address tag 16 and a
data array 17. The cache memory 3 is composed of a plurality of
cache lines 18.

The status bit 15 takes any one status of the following four,
i.e., "I" (Invalid), "C" (Clean), "D" (Dirty) and "DSM" (Dirty
Self-Modified). The status "I" means that the content of the
cache line is invalid. The status "C" means that the content of
the cache line coincides with that stored in the main memory.
The status "D" means that the content of the cache line does
not coincide with that stored in the main memory because of
modification. The status "DSM" means that the content of the
cache line does not coincide with that stored in the main
memory because of modification by the local processor. The
cache coherency controller 4 maintains coherency while keeping
a sequential order among a plurality of caches.

Next, the operation of the cache memory 3 of the first
embodiment will be described.

FIGS. 5 and 6 are flowcharts each showing algorithm for
maintaining cache coherency by the cache coherency maintenance
protocol sequencer 14 in the cache coherency controller 4 of
the first embodiment. Specifically, FIG. 5 shows operation
algorithm used when a reading cache miss-hit occurs and FIG. 6
shows operation algorithm used when another processor starts a
writing operation. When a read hit occurs, the corresponding
cache memory 3 supplies data. When a write miss occurs, a read
miss operation and a write hit operation are continued.

Referring first to FIG. 5, when read access is to be made by
the processor 2a, if data for the access does not exist in the
cache memory 3 (miss-hit) (step 501), the cache coherency
controller 4a secures the right of using the common bus 6.

Then, the cache coherency controller 4a searches whether the
entry of a relevant address is held or not in any of the cache
lines 18 of the cache memories 3b, 3c and 3d which belong to
the other processors 2b, 2c and 2d. For this purpose, the cache
coherency controller 4a outputs a required address to the
address bus 8 and then outputs information indicating that the
variety of accessing is "a data request because of a read miss"
and a thread identifier to the control signal bus 9 (step 502).

The other cache coherency controllers 4b, 4c and 4d
which have not obtained the right of using the common bus
6 obtain the variety of accesses (read or write) and an
address by monitoring the common bus 6. Each of the cache
coherency controllers 4b, 4c and 4d refers to the index of
information so as to determine whether an address identical
to the address obtained by monitoring exists or not in the
cache line 18 of each of the corresponding cache memories
3b, 3c and 3d.

If the result of determination by referring to the index
shows that the line of the identical address exists in the
plurality of cache memories ("No" in step 505), the cache
coherency controller 4a outputs data held in any of the cache
memories to the data bus if the line is only in status "C".

If the result of determination shows that the line of status
"D" or status "DSM" exists ("Yes" in step 505), the cache
coherency controller 4a outputs corresponding data to the
data bus 7 (step 508) if the thread identifier of the status "D"
or the status "DSM" precedes a requested thread identifier
("Yes" in step 506). In this case, if a plurality of lines exist,
arbitration is performed for the common bus 6 so as to
supply data from the cache coherency controller 4 corresponding
to the processor for executing a preceding thread
which is closest in order to the requested thread identifier. If
the thread identifier of the line of the status "D" or the status
"DSM" succeeds the requested thread identifier ("No" in
step 506), the cache coherency controller 4a outputs data
from the line of the status "C" or the main memory (step
507).

If the result of determination made by referring to the
index shows that no identical address lines exist in the cache
line 18 corresponding to any of the cache coherency controllers
4b, 4c and 4d, data is outputted from the main
memory 11 to the data bus 7 based on a request from the
cache coherency controller 4a (step 504).

The cache coherency controller 4a which has requested
data fetches desired data in the cache line 18 by receiving the
broadcast data.

At this time, if the data is supplied from any of the cache
memories 3b, 3c and 3d and its line is in status "D" or status
"DSM", the status bit 15 of the cache memory 3a which has
made the request is set to status "D" (step 508) or to status
"C" in other cases (step 509).

In the cache memory of the conventional multi-processor
system, the source that supplies data when a read miss occurs
is any other cache or the main memory.
However, according to the first embodiment, a sequential
order is maintained among threads by imposing restrictions
on a data supplying method based on the sequential
relationship by thread identifiers so as to prevent data written by
a succeeding thread from being supplied to a preceding
thread.
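
The read-miss rules of FIG. 5 described above can be sketched in a
few lines of Python. This is only an illustrative model under stated
assumptions, not the patented hardware: the names RemoteLine and
supply_on_read_miss are hypothetical, the thread sequence information
is modeled as a list of thread identifiers ordered from oldest to
newest, and a cache line is reduced to a single data value.

```python
from dataclasses import dataclass


@dataclass
class RemoteLine:
    """Hypothetical model of a matching line found in another cache."""
    status: str     # "I", "C", "D" or "DSM"
    thread_id: int  # thread running on the processor that owns the cache
    data: int


def supply_on_read_miss(requester, order, remote_lines, main_memory_value):
    """Return (data, new_status) for the requesting cache.

    `order` lists thread identifiers from oldest to newest (an assumed
    stand-in for the thread sequence information table).
    """
    pos = {tid: i for i, tid in enumerate(order)}
    valid = [l for l in remote_lines if l.status != "I"]

    # Dirty lines may only be supplied by threads that precede the requester.
    dirty_preceding = [
        l for l in valid
        if l.status in ("D", "DSM") and pos[l.thread_id] < pos[requester]
    ]
    if dirty_preceding:
        # Pick the preceding thread closest in order to the requester.
        src = max(dirty_preceding, key=lambda l: pos[l.thread_id])
        return src.data, "D"

    # Otherwise fall back to a clean copy, or to the main memory.
    clean = [l for l in valid if l.status == "C"]
    if clean:
        return clean[0].data, "C"
    return main_memory_value, "C"
```

In this sketch a dirty line owned by a succeeding thread is never
chosen as the data source, which is the restriction that keeps a
preceding thread from observing a future value.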

Referring now to FIG. 6, when writing is to be performed
from the processor 2a in the cache memory 3a (step 601),
the cache coherency controller 4a secures the right of
accessing to the common bus 6. The controller 4a outputs a
write address to the address bus 8, and outputs information
indicating "writing" and a thread identifier to the control
signal bus 9. At the same time, the controller 4a outputs
write data to the data bus 7.
After a signal for writing has been outputted to the
common bus 6, the cache coherency controllers 4b, 4c and
4d which do not have the right of accessing refer to the
respective indexes of information in the cache memories. If
the cache line 18 holds the data of address identical to the
address outputted to the address bus 8 ("Yes" in step 602)
and the thread identifier of the processor 2 to which the
cache memory 3 belongs succeeds a thread identifier on the
control signal bus 9 ("Yes" in step 604), since the operation
is writing performed by a preceding thread, the cache
coherency controller 4 reflects the content of the writing in
the cache line 18. Further, if the status bit 15 of the cache line
18 is status "C", the status is changed to status "D" (step
606).

The status bit 15 of the cache line 18 of the cache memory 
3 corresponding to the processor 2a which has performed 
the writing operation is set to status "DSM" by the cache 
coherency controller 4a. 

In this way, the writing of the preceding thread is auto- 
matically reflected in the cache memory 3 of the succeeding 
thread. However, since the writing of the succeeding thread 
is not reflected in the cache memory 3 of the preceding 
thread (step 605), a time sequence relationship between the 
threads is maintained. 
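
The write handling of FIG. 6 described above can be illustrated with
a small Python sketch. It is a simplified model under assumptions
that are not part of the original text: the function name
snoop_write is hypothetical, each cache is a dictionary keyed by
address, a line holds one value instead of a data array, and the
thread order is given as a list from oldest to newest.

```python
def snoop_write(caches, order, writer, address, value):
    """Apply one write and its snooped side effects.

    `caches` maps a thread identifier to {address: [status, data]}.
    `order` lists thread identifiers from oldest to newest (assumed
    stand-in for the thread sequence information table).
    """
    pos = {tid: i for i, tid in enumerate(order)}
    for tid, lines in caches.items():
        if tid == writer:
            # The writing processor's own line becomes Dirty Self-Modified.
            lines[address] = ["DSM", value]
        elif address in lines and lines[address][0] != "I":
            if pos[tid] > pos[writer]:
                # Succeeding thread: reflect the preceding thread's write.
                status = lines[address][0]
                lines[address] = ["D" if status == "C" else status, value]
            # Preceding thread: the write is deliberately not reflected.
    return caches
```

A write by thread 2 therefore updates the copy held for thread 3 but
leaves the copy held for thread 1 untouched, so thread 1 can still
read the value from before the write.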

If a preceding thread exists, the cache line 18 of a
succeeding thread which is in status "DSM" is prohibited
from being written back to the main memory 11. This is
because if an identical address is requested by the
preceding thread, a value before writing performed by the
succeeding thread is supplied from the main memory 11. If
it is necessary to write back the cache line 18 which is not
coincident with the main memory of the succeeding thread
in order to store the data of another address in the cache
memory 3, the execution of the succeeding thread is
interrupted and after the end of the execution of the preceding
thread, the cache line is written back. After the end of the
execution of the preceding thread, the content of the data
array 17 of the cache line 18 which is in status "DSM" is
written back to the main memory 11.

On the other hand, for the cache line 18 of status "D",
since the cache line 18 of the identical address which is in
status "DSM" exists in the cache memory of another processor
2, it is unnecessary to write back. If the execution of
all the preceding threads is finished while the status "D" is
maintained, the status is changed to status "C".
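
The write-back rules for status "DSM" and status "D" lines stated
above can be summarized as a small decision function. The sketch
below is an assumption-laden illustration: write_back_action and its
return labels are hypothetical names, `order` is an oldest-to-newest
list of thread identifiers, and `running` is the set of threads that
have not yet finished.

```python
def write_back_action(status, owner, order, running):
    """Decide what may happen to a line owned by thread `owner`.

    Returns one of "defer", "write_back", "keep", "to_clean", "none".
    """
    pos = {tid: i for i, tid in enumerate(order)}
    preceding_running = any(pos[t] < pos[owner] for t in running if t != owner)

    if status == "DSM":
        # Write-back is prohibited while any preceding thread still runs.
        return "defer" if preceding_running else "write_back"
    if status == "D":
        # No write-back is needed; the line turns clean once all
        # preceding threads have finished.
        return "keep" if preceding_running else "to_clean"
    return "none"
```

For instance, with threads 1, 2 and 3 in that order, a "DSM" line of
thread 2 is deferred while thread 1 is still running and becomes
eligible for write-back once thread 1 has ended.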

According to the first embodiment, by performing the 
above-described cache controlling, the sequential order can 
be maintained among the threads, and data anti-dependence 
can be secured by the cache memory. However, if writing is 
started by the preceding thread for an address identical to the 
address already written or read by the succeeding thread 
thereafter, the sequential order between the threads cannot 
be maintained as it is. In this case, synchronism must be 
acquired between the threads by using software. 

Next, the multi-processor system of the second embodi- 
ment of the present invention will be described. 

Referring first to FIG. 7, instead of the common bus 6 of 
the first embodiment described above with reference to FIG. 
2, the multi-processor system of the second embodiment 
uses a network 23 for connecting a cache memory 21 and a
main memory 28 to each other. Accordingly, in the second
embodiment, a network interface 22 is newly added. A cache 
coherency controller 26 is shared by processors. The cache 
coherency controller 26 is directly connected to a main 
memory 28. A directory table 27 is connected to the cache 
coherency controller 26. 
FIG. 8 is a block diagram showing the main memory 28 and the
directory table 27 of the second embodiment. The directory
table holds a status bit 29 corresponding to each processor 20.
The status bit 29 corresponds to the memory line 32 of the main
memory 28.

Referring to FIG. 8, the status bit 29 of the directory table
27 includes a valid bit indicating the existence of a copy in a
cache memory 21 accompanying the processor 20, and a dirty bit
31 indicating that the processor 20 has modified the copy.

The set condition of the dirty bit 31 is called status "D".
Thread sequence information is transmitted from a thread
management unit 19 to the cache coherency controller 26 via a
thread information transmission bus 25 for each change of the
execution status of a thread.

For an access request made by each processor to the main memory
28, the cache coherency controller 26 maintains coherency
between caches by algorithm shown in FIG. 9 or FIG. 10.

It is now assumed that a processor 20a has started an operation
for the memory. First, an operation for reading will be
described.

If the data of an address requested by the processor 20a does
not exist (cache miss-hit) (step 901), the cache memory 21
issues a data reading request to the cache coherency controller
26 via the network 23.

The cache coherency controller 26 searches the directory table
27 (step 902). If the controller 26 finds the entry of status
"D" ("Yes" in step 903) and the entry of the status "D" is for
a preceding thread ("Yes" in step 905), data transfer is
requested to the cache memory 21 having the entry of the status
"D". After having received the request, the cache memory 21
transfers requested line data to the cache memory 21a of the
processor 20 which has issued the request (step 907).

In cases other than the above, desired data is transferred to
the cache memory 21a of the processor 20a which has issued the
request (steps 904 and 906).

In any cases, the valid bit 30 of the status bit 29a of the
directory table 27 corresponding to the processor 20a is set.

Next, an operation for writing will be described. For writing,
irrespective of a cache hit or a cache miss-hit, a data writing
request is issued to the cache coherency controller 26 via the
network 23.

When a cache hit occurs, an operation described below will be
performed thereafter. When a cache miss occurs, an operation
described below will be performed after a read operation
performed at the time of the occurrence of the cache miss.

The cache coherency controller 26 searches the directory table
27 and transmits write data to the entry of a succeeding thread
among the entries in which the valid bit 30 is set (step 1006).
The controller 26 also sets the dirty bit 31 of the directory
table 27 corresponding to the processor 20 which has made the
request.

According to the second embodiment, as in the case of the first
embodiment, if a preceding thread exists, for the entry of the
cache memory 21 of a succeeding thread which is in status "D",
the write-back to the main memory 28 is prohibited. By
controlling the reflection of each of writing results for
threads other than a first preceding thread in the main memory
28 and maintaining the reading request correctness of the first
preceding thread, the writing of the preceding thread is
automatically reflected in the cache memory 21 of the
succeeding thread. However, since the writing of the succeeding
thread is not reflected in the cache memory 21 of the preceding
thread, a time sequence relationship is maintained among the
threads.

Next, the multi-processor system of the third embodiment of the
present invention will be described.

Referring to FIG. 11 which illustrates the third embodiment, a
cache memory 35 is shared by processors #0 to #3 (34a to 34d).
In the third embodiment, the cache memory 35 is constructed in
such a manner that data relating to a particular address can be
stored in a plurality of cache lines 43 (multi-way cache
structure).

Referring to FIG. 12 which also illustrates the third
embodiment, one entry of the cache memory holds a thread
identifier tag 39 in each line. The status bit of the cache
takes any one of the following three, i.e., "I" (Invalid), "C"
(Clean) and "D" (Dirty). The status "I" means that the content
of the cache line is invalid. The status "C" means that the
content of the cache line coincides with that of the main
memory. The status "D" means that the content of the cache line
does not coincide with that of the main memory.

Referring to FIG. 13 which illustrates a cache coherency
controller of the third embodiment, in the cache coherency
controller shown and denoted by a numerical reference 36 [...]
cache memory 35, a request arbiter 45 arbitrates requests from
the processors #0 to #3 (34a to 34d). A status comparator 49,
an address comparator 50 and a protocol sequencer 51 determine
a cache hit/miss. For reading, data supplied from each line 43
of the cache is selected. For writing, a line 43 in which
writing is to be performed is decided.

A write buffer 47 is provided for performing writing in a part
of a data array 42. The write buffer 47 reads the value of the
data array 42 beforehand, corrects a necessary portion and
writes the corrected data in the data array 42 again.

Next, the operation of the cache memory of the third embodiment
will be described.

In the third embodiment, the status of the cache hit takes the
following three, i.e., self ID hit, preceding ID hit and
succeeding ID hit. The status self ID hit means that a cache
line coincident with a requested ID exists. The status
preceding ID hit means that the cache line of a thread ID
preceding the requested ID exists. The status of succeeding ID
hit means that the cache line of a thread succeeding the
requested ID exists.

When an access request is made from the processor 34 to the
cache 35, a cache entry 44 is selected based on a part of a
request address signal 55.

From the selected cache entry, all the cache lines 43 (i.e.,
from way 0 to way 3) are outputted from the thread identifier
tag 39, the status bit 40, the address tag 41 and the data
array 42.

The address comparator 50 compares each address outputted from
the address tag 41 with the request address signal 55 and sends
out information regarding coincidence/non-coincidence for each
cache line 43 to the protocol sequencer 51.

The status comparator 49 sends out information used for
determining which status the thread identifier tag of the cache
line 43 takes, self ID, preceding ID or succeeding ID, from the
thread identifier of the processor 34 which has issued the
request and a thread sequence information signal 53 to the
protocol sequencer 51.

Based on such information, the protocol sequencer 51 supplies
data, stores data, and maintains coherency by algorithm shown
in FIGS. 14 and 15.
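
The combined work of the address comparator 50 and the status
comparator 49 described above can be sketched as a classification of
the ways of one selected cache entry. This is a hedged illustration
only: classify_hits is a hypothetical name, each way is modeled as a
(thread identifier tag, status, address tag) tuple, the thread
sequence information is an oldest-to-newest list, and the example
contents are invented and do not reproduce any figure of the patent.

```python
def classify_hits(ways, request_addr, requester, order):
    """Group the ways of one cache entry by hit type.

    Returns a dict with way indices under "self", "preceding" and
    "succeeding"; all three lists empty means a cache miss.
    """
    pos = {tid: i for i, tid in enumerate(order)}
    hits = {"self": [], "preceding": [], "succeeding": []}
    for way, (tid, status, addr) in enumerate(ways):
        # Address comparison: invalid or non-matching lines never hit.
        if status == "I" or addr != request_addr:
            continue
        # Status comparison: place the line's thread relative to the requester.
        if tid == requester:
            kind = "self"
        elif pos[tid] < pos[requester]:
            kind = "preceding"
        else:
            kind = "succeeding"
        hits[kind].append(way)
    return hits


# Hypothetical four-way entry, requested by thread 2 for address 0x100.
example = [(1, "D", 0x100), (2, "C", 0x200), (3, "C", 0x100), (0, "I", 0x100)]
print(classify_hits(example, 0x100, 2, [1, 2, 3]))
```

A protocol sequencer built on top of such a classification would then
prefer a self ID hit, fall back to the closest preceding ID hit, and
consider a succeeding ID hit only under the status conditions given
in the text.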
An operation performed when the processor 34 makes a reading
request (step 1401) and the request is arbitrated and selected
by the request arbiter 45 will be described.

The address comparator 50 searches whether a requested address
exists in the cache line 43 or not (step 1402). If the
requested address does not exist, the operation is
(unconditionally) determined to be a cache miss and data is
fetched from the main memory (step 1403).

If the cache line 43 coincident with the requested address
exists (YES in step 1402), the status comparator 49 compares
the thread identifier tag 39 with the executed thread
identifier of the processor 34 which has made the request (step
1404). If IDs are identical to each other (i.e., self ID hit),
the data of the data array 42 is supplied from the cache line
43 (self ID line) (step 1407).

If IDs are not identical to each other ("No" in step 1404), the
status comparator 49 searches the existence of one coincident
with a preceding ID (step 1405). If the cache line 43 exists
for the preceding ID, the data of the data array 42 of the
cache line 43 is supplied (step 1408). If a plurality of cache
lines 43 coincident with the preceding ID exist, priority is
given to the data of the cache line 43 holding ID closest to
the self ID.

If the cache line does not hit the preceding ID, the status
comparator 49 searches the existence of one coincident with the
succeeding ID (step 1406). If a line coincident with the
succeeding ID exists and its status is status "C", the data of
the data line 42 of this cache line 43 is supplied and the
succeeding ID is changed to self ID (step 1409).

If no cache lines 43 coincident with the succeeding ID exist,
data is fetched from the main memory 38 (step 1402).

For example, in a state where a certain cache entry 44 is in a
state like that shown in FIG. 16, when the address tag 41 of
the cache memory 35 requests data regarding addresses "0x100",
"0x200", "0x300" and "0x400" in the processor 34 in which a
thread identifier is "2", the data of the address "0x100" is a
preceding ID hit. Then, the data is supplied from the data
array 42 of a preceding thread

[...]

identifier of the processor 34 which has made the request (step
1504). If IDs are identical to each other (i.e., self ID hit),
data is written in the data array 42 of the cache line 43 (step
1507). If a succeeding thread exists and its status is status
"D", similar data is written in its cache line. These
operations are performed by the write buffer 47.

On the other hand, if no identical ID exists ("No" in step
1504), the status comparator 49 searches the existence of one
coincident with the preceding ID. If the cache line 43 of
preceding ID exists ("Yes" in step 1505), a copy section
between lines copies the cache line of the preceding ID [...]
of self ID and performs writing in the cache line of the self
ID (step 1508). No writing is performed in the cache line of
the preceding ID. However, if the cache line of status "D"
having a succeeding thread identifier exists, similar data is
also written in this cache line. If a plurality of cache lines
coincident with the preceding ID exist, the data of the cache
line holding ID which is closest to the self ID is supplied for
copying between lines.

If no preceding ID hit is realized, the existence of one
coincident with the succeeding ID is investigated (step 1506).
If a cache line coincident with the succeeding ID exists and
its status is status "C", writing is performed for the data of
the cache line via the write buffer 47. Then, the succeeding ID
is changed to self ID (step 1509).

If no cache lines coincident with the succeeding ID exist, the
operation of a cache miss is performed (step 1502).

For example, [...] processor 34 in which a thread identifier is
"2", if a writing request is made [...] preceding ID hit is
[...]. In this case, the data of the way 0 is copied as self ID
for the way 1 or the way 3 and writing is performed in the
copied data array (i.e., way 1 or way 3). The thread identifier
tag 39, the status bit 40 and the address tag 41 are set to
"2", status "D" and "0x100" respectively.

When accessing is to be made to the address "0x200", because of
self hit, data is written in the data array 42 via the

identifier, i.e., the way 0 of a thread identifier The<iftta of the ^0 ^^^^ ^^^^ ^^^^ ^ ^^^^^ ^ ^ 

a ddress ^^Oy?n (y' is srlf IP hit (thrf.ad identifier tag 39 o f the ^^^^^ 

cache entry is "2"). The data is supplied from the data array ^^^^ accessing is to be made to the address "0x300", the 

42 ot a way 1. ^^^^ address "0x300" exists in the way 2. However, 
The data of the address "0x300" exists in a way 2. the thread identifier tag 39 is "3", a thread is a succeeding 

However, since the way 2 is for the thread identifier tag of ^ thread and a state is staUis "D" for the way 2, a cache miss 

"3", a succeeding thread and status "D", the data is fetched committed. 

from the memory. The data of the address "0x400" exists in the way 3 and 
The data of the address "0x400" exists in a way 3. Since the thread identifier tag 39 is "3". However, since a state is 
the way 3 is for the thread identifier 39 of "3", a succeeding status "C, succeeding ID hit is realized. Accordingly, the 
thread and status "C*, succeeding ID hit is realized. data of the cache line 43 of the way 3 is written via the write 
Accordingly, the data is supplied from the data array 42 of buffer 47. Then, the succeeding ID is changed to self ID. 
the way 3. Then, the thread identifier tag 39 of the way 3 is Assuming that the status of the cache entry is status 
changed to "2". "Before Write" like that shown in FIG. 17, after the pro- 
Next, by referring to FIG. 15, an operation when a write 55 cesser 34 of a thread identifier "2" makes a writing request 
request is made by the processor 34 (1501) and the request for the data whose address tag 41 is "0x200", preceding ID 
is arbitrated and selected by the request arbiter 45 will be hit is realized and the data of the way 1 is copied to the way 
described. 0 or the way 3. Herein, it is assumed that the data is copied 

The address comparator 50 searches the existence of a to the way 0. 

requested address in the cache line 43 (step 1502). If the go Thus, for the way 0, the thread identifier tag 39, the 

requested address does not exist, the operation uncondition- address tag 41 and the status bit 40 are set to "2", "0x200" 

ally becomes a cache miss. After data is fetched from the and status "D" respectively. The result of writing the content 

memory, write data is merged by the write buffer 47 and of the data array 42 of the way 1 for the data copied to the 

written in the cache line 43 (step 1503). way 0 is written in the data array 42 of the way 0 via the 

If a cache line 43 coincident with the requested address 65 write buffer 47. 

exists ("Yes" in step 1502), the status comparator 49 com- At the same time, the same data is written in the data array 

pares the thread identifier tag 39 with the executed thread 42 of the way 2 via the write buffer 47. By the above - 
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described operations, the status of the cache entry becomes 
status "After Write" like that shown in FIG. 17. When the 
data of the cache line 43 is to be copied or new address data 
is to be fetched from the main memory 38, the data of the 
cache line 43 existing at this time must be replaced by S 
another. In the conventional cache memory, a cache line 43 
whose data mvsi be replaced by another is decided at 
random or based on a past referencing history at this time. 

On the other hand, in the cache memory of the
embodiment, since for a preceding thread the process needs
to refer to data before the correction of the data by a
succeeding thread, the write-back operation of the data array
42 of status "D" to the main memory 38 must be prohibited
unless the thread is the first preceding thread in the system.
In order to prevent the occurrence of a deadlock caused by the
impossibility of securing any cache lines 43 for the
preceding thread, one cache line 43 must always be secured for
the first preceding thread in one entry. Accordingly, if no
cache line 43 can be secured for a succeeding thread, the
process must wait for the end of the execution of the
preceding thread.
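
The write-back restriction described above can be illustrated with a small victim-selection check. This is only a sketch under stated assumptions, not the circuit of the embodiment: the names `WayState` and `choose_victim` are hypothetical, status "D" is taken to mean that the line would need a write-back, and any other status is assumed to be replaceable without one. The reservation of one line per entry for the first preceding thread is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WayState:
    thread: int  # thread identifier tag of the line
    status: str  # status of the line; "D" is assumed to require a write-back


def choose_victim(entry: List[WayState], oldest_thread: int) -> Optional[int]:
    """Return a way that may be replaced, or None if the requester must wait.

    A status "D" line may be written back only when it belongs to the
    first preceding (oldest) thread; other "D" lines are never chosen.
    """
    for way, line in enumerate(entry):
        if line.status != "D" or line.thread == oldest_thread:
            return way
    return None  # no line can be secured: wait for the preceding thread to end
```

When the function returns `None`, the caller corresponds to the succeeding thread that has to wait for the end of the execution of the preceding thread.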

By using a model which does not define the data depen- 
dence of a succeeding thread on a preceding thread as a 
thread execution model, even if restrictions are imposed on 
the cache line 43 to be replaced by another, the occurrence 
of a deadlock can be prevented. 

According to the third embodiment, by performing the 
above-described cache controlling, a time sequence relation- 
ship can be maintained among threads and data anti- 
dependence can be eliminated even if the cache memory 
shared by the processors 34 is used. However, if a preceding 
thread starts writing in an address identical to an address 
already written or read by a succeeding thread later on, the 
time sequence relationship among the threads cannot be
maintained as it is. In this case, synchronization must be
acquired among the threads by using software.
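
The lookup order of the third embodiment (self ID, then the closest preceding ID, then a succeeding ID of status "C", otherwise a miss) can be sketched as follows. This is a simplified model under assumptions and not the disclosed hardware: a smaller thread identifier is assumed to denote an earlier thread, one integer stands in for the data array 42, the spare way for a copy is chosen by the caller, and the names `Line`, `lookup`, `read` and `write` are hypothetical. Memory fetch on a miss is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    addr: int      # address tag 41
    thread: int    # thread identifier tag 39
    status: str    # status bit 40, "C" or "D"
    data: int = 0  # stands in for the data array 42


def lookup(entry: List[Line], addr: int, me: int) -> Tuple[str, Optional[int]]:
    """Classify an access as a self, preceding or succeeding ID hit, or a miss."""
    same = [(w, ln) for w, ln in enumerate(entry) if ln.addr == addr]
    for w, ln in same:
        if ln.thread == me:
            return "self", w
    preceding = [(w, ln) for w, ln in same if ln.thread < me]
    if preceding:
        # Several preceding lines: prefer the ID closest to the self ID.
        return "preceding", max(preceding, key=lambda p: p[1].thread)[0]
    for w, ln in same:
        if ln.thread > me and ln.status == "C":
            return "succeeding", w
    return "miss", None


def read(entry: List[Line], addr: int, me: int) -> Tuple[str, Optional[int]]:
    kind, way = lookup(entry, addr, me)
    if kind == "succeeding":
        entry[way].thread = me  # the succeeding ID is changed to the self ID
    return kind, way


def write(entry: List[Line], addr: int, me: int, value: int, spare_way: int) -> str:
    kind, way = lookup(entry, addr, me)
    if kind == "self":
        entry[way].data = value
    elif kind == "preceding":
        # Copy as a self ID line and write there; the preceding line is untouched.
        entry[spare_way] = Line(addr, me, "D", value)
        # A status "D" line of a succeeding thread receives the same data.
        for ln in entry:
            if ln.addr == addr and ln.thread > me and ln.status == "D":
                ln.data = value
    elif kind == "succeeding":
        entry[way].data = value
        entry[way].thread = me
    return kind  # on "miss" the line would be fetched and merged (not modeled)
```

With an entry laid out like the FIG. 16 example (the thread identifier of the way 0 and the statuses of the ways 0 and 1 are not given in the text and have to be assumed), reads by thread "2" of "0x100", "0x200", "0x300" and "0x400" classify as preceding hit, self hit, miss and succeeding hit respectively.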

Next, the fourth embodiment of the present invention will 
be described. 

According to the fourth embodiment, in addition to the
elements provided in the first to third embodiments, hardware
is provided to guarantee that even if a preceding thread
performs writing in an address already written or read by a
succeeding thread, the content of the writing is not
reflected in the cache memory of the succeeding thread.

Referring to FIG. 18, the cache line of the cache memory
of the fourth embodiment comprises a write mask 62 in
addition to the configurational elements of the first
embodiment described above with reference to FIG. 4. This write
mask 62 is provided for each word unit or a minimum
writing unit of a data array 61. The write mask 62 is cleared
when new data from a memory or data from another cache
is stored in a cache line 63. If writing is performed by the
thread of the cache line 63, the write mask 62 is set for each
written word or a minimum writing unit.

As in the cases of the first to third embodiments, in the
fourth embodiment, in order to maintain a sequential order
among threads, writing by a preceding thread must be
reflected in the data array 61 of a cache line 63 which
belongs to the succeeding thread of an identical address tag
60. Accordingly, controlling is performed for this purpose.

In the first to third embodiments, however, no guarantee
is given to ensure that the writing of a preceding thread
performed for an address already written or read by a
succeeding thread is not reflected in the cache memory of the
succeeding thread side (i.e., writing sequence relationship
among a plurality of threads). In order to maintain such a
writing sequence relationship, synchronization must be
acquired between the threads by using software as in the
case of reading of the write data of the preceding thread by
the succeeding thread.

Next, as examples of the operations of the fourth 
embodiment, operations when the data array 61 of the cache 
line 63 is in states like those shown in FIGS. 19 to 22 will 
be described. 

Referring to FIG. 19, a writing operation 1 (64) is writing
performed by the processor to which the cache line belongs.
By this operation, "0xffffffff" is written in an address
"0x30000".

After the above-noted operation, the data of "0xffffffff" is
written in the "0x30000" address position of the data array
61 and simultaneously a corresponding write mask 62 is set
(see FIG. 20).

Next, as a writing operation 2 (65), an operation per- 
formed so as to maintain a time sequence relationship when 
a preceding thread writes the data of "0x01234567" in an 
address "0x30008" will be described. In this case, since no 
write mask 62 corresponding to the address "0x30008" is
set, the writing operation of the preceding thread is reflected 
and the data of "0x01234567" is written in the address 
"0x30008" position of the data array 61 (see FIG. 21). 

Lastly, as a writing operation 3 (66), an operation per- 
formed so as to maintain a time sequence relationship when 
a preceding thread writes the data of "0x00000000" in the 
address "0x30000" will be described. In this case, since a
write mask 62 corresponding to the address "0x30000" is
set, the writing operation of the preceding thread is not
reflected in the address "0x30000" position of the data array
61 (see FIG. 22). Such a situation occurs because, according
to the rule of a time sequence relationship among threads, 
the writing operation 1 (64) is prescribed as an event 
occurring later than the writing operation 3 (66). 

According to the fourth embodiment, processing for
preventing the reflection of writing from the preceding thread in
the word or the minimum writing unit in which the write 
mask 62 has been set is performed by the cache memory of 
the succeeding thread side and thereby the time sequence 
relationship of writing in an identical address can be main- 
tained among a plurality of threads by controlling of the 
cache memory. 
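
The write mask behavior of FIGS. 19 to 22 can be traced with a short model. It is a sketch only: the class name `MaskedLine`, the 4-byte word and the 4-word line are assumptions made for the illustration, and the real write mask 62 is a hardware bit per word or minimum writing unit.

```python
WORD_BYTES = 4  # assumed minimum writing unit


class MaskedLine:
    """A cache line of a succeeding thread with one write-mask bit per word."""

    def __init__(self, base: int, words: int = 4):
        self.base = base
        self.data = [0] * words
        self.mask = [False] * words

    def fill(self, words):
        # New data from memory or from another cache clears the write mask.
        self.data = list(words)
        self.mask = [False] * len(self.data)

    def _index(self, addr: int) -> int:
        return (addr - self.base) // WORD_BYTES

    def own_write(self, addr: int, value: int) -> None:
        # Writing by the thread that owns the line sets the mask bit.
        i = self._index(addr)
        self.data[i] = value
        self.mask[i] = True

    def preceding_write(self, addr: int, value: int) -> bool:
        # A write from a preceding thread is reflected only in unmasked words.
        i = self._index(addr)
        if self.mask[i]:
            return False
        self.data[i] = value
        return True
```

Replaying the three operations of the text (an own write of "0xffffffff" to "0x30000", a preceding write of "0x01234567" to "0x30008", then a preceding write of "0x00000000" to "0x30000") leaves "0xffffffff" at "0x30000" and "0x01234567" at "0x30008".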

Next, the fifth embodiment of the present invention will 
be described. In the fifth embodiment, there are a plurality 
of definitions for the operation mode of a multi-processor 
system and the mode can be changed from one definition to 
the other. The configuration of the cache of the fifth
embodiment can be applied to all the cache memories of the first
to fourth embodiments.

In the cache memory of each of the first to fourth 
embodiments, the cache coherency controller performs con- 
trolling for propagating writing only to the cache of a 
succeeding thread based on thread sequence information 
received from a thread management unit. However, in the 
fifth embodiment, by means of extension for switching such 
writing propagation control to be performed for all the cache 
memories holding identical addresses, a model for guaran- 
teeing all sequential relations among threads and executing 
a plurality of threads is executed. In other words, by instruct- 
ing in which mode the cache coherency controller of each 
embodiment should be operated beforehand, an operation 
mode is switched. 
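
The difference between the two operation modes can be sketched as a choice of which caches receive a propagated write. The function name `write_targets`, the list form of the thread sequence information (oldest thread first) and the `sequenced` flag are assumptions made for this illustration only.

```python
from typing import List


def write_targets(writer: int, holders: List[int], order: List[int],
                  sequenced: bool = True) -> List[int]:
    """Return the threads whose caches should receive a write by `writer`.

    holders:   threads whose caches hold the written address
    order:     thread sequence information, oldest thread first (assumed form)
    sequenced: True  -> propagate only to succeeding threads
               False -> propagate to every cache holding the address
    """
    others = [t for t in holders if t != writer]
    if not sequenced:
        return others
    position = {t: i for i, t in enumerate(order)}
    return [t for t in others if position[t] > position[writer]]
```

In this sketch the mode switch is a single flag that bypasses the preceding/succeeding decision, which mirrors the idea of switching the decision logic between valid and invalid.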

Now, the extension method of each embodiment will be 
described in detail. 

First, the multi-processor system comprising the
distributed type cache coherency controller of the first embodiment
which is provided for each of the processors connected to
one another via buses and configured by bus connection will
be described.

In this case, a signal for non-specification of the sequence 
of threads is transmitted from the thread management unit 1 
shown in FIG. 2 through the thread sequence information
transmission bus 10 to the cache coherency controller 4. 
After receiving such information, the cache coherency con- 
troller 4 performs no controlling based on the determination 
of "a preceding thread or a succeeding thread". In other 
words, when a reading miss occurs, no determination is 
made as to "whether a thread is a preceding thread or not" 
like that shown in the flowchart of FIG. 5 (step 506 of FIG. 
5) and data is always supplied from the cache of "DVDSM" 
status. 

Also, when writing by another thread occurs, no
determination is made as to "whether a thread is preceding or
succeeding" like that shown in the flowchart of FIG. 6 and
writing by another thread is reflected in the cache of a self
thread. Switching of these modes is performed only by
switching each decision logic of the cache coherency
controller 4 between valid and invalid. Accordingly, the
extension of hardware can be limited to a minimum size.

Next, extension for the second embodiment will be
described. In the second embodiment (see FIG. 7), only one
cache coherency controller 26 exists in the system, which is
shared by the configuring elements of the system. As in the
case of the bus connection method, if no time sequence
relationship exists among threads, a signal for
non-specification of the sequence of the threads is transmitted
from the thread management unit 19 through the thread
sequence information transmission bus 25 to the cache
coherency controller 26. In this case, no determination is
made as to whether "a thread is preceding or succeeding"
like that shown in the flowcharts of FIGS. 9 and 10. If the
cache memory 21 of status "D" exists when reading is to be
performed, a request is made for data transfer from the cache
memory 21. For writing, write data is transmitted to a shared
line. As in the case of bus connection, switching of these
modes is performed only by switching each decision logic of
the cache coherency controller 26 between valid and invalid.
Accordingly, the extension of hardware can be limited to a
minimum size.

In the third embodiment, when parallel processing is to be 
performed by a thread model having no time sequence 
relationship, caches are all treated as being shared. In other 
words, the processing step of comparing the value of the 
thread identifier tag 39 with another for the cache entry 44 
shown in FIG. 13 is removed from the flowchart of FIGS. 14 
and 15. That is, only by means of comparison between the
requested address and the address tag 41, a cache hit or
a miss is determined.

Therefore, in the fifth embodiment, it is only necessary to 
switch each decision logic of the cache coherency controller 
36 between valid and invalid and thus the extension of 
hardware can be limited to a minimum size. In addition, in
such an operation, there is no possibility that a plurality of 
cache lines hold data of identical addresses. 

Next, the sixth embodiment of the present invention will 
be described. 

In the sixth embodiment, the operation modes of the fifth
embodiment are switched based on a requested address.
Changing of the cache algorithm based on a reference
address is performed by giving an attribute to each
entry of a TLB (Translation Lookaside Buffer). Such a
switching technology has been available in the prior art. The
sixth embodiment can be realized by combining such a
technology with the fifth embodiment.
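
Address-based mode selection can be pictured as a per-page attribute lookup. The sketch below is illustrative only: the page size, the dictionary standing in for the translation lookaside buffer attributes and the name `sequenced_for` are all assumptions, and no particular attribute encoding is implied by the text.

```python
from typing import Dict

PAGE_SIZE = 4096  # hypothetical page size


def sequenced_for(addr: int, page_attr: Dict[int, bool],
                  default: bool = True) -> bool:
    """Return True if accesses to `addr` use the thread-sequenced mode.

    page_attr maps a page number to its mode attribute; pages without an
    entry fall back to `default` (an assumption of this sketch).
    """
    return page_attr.get(addr // PAGE_SIZE, default)
```

The returned flag could then select between the two propagation behaviors of the fifth embodiment for each access.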

As apparent from the foregoing, the present invention is
effective in that by extending the cache coherency
maintenance function of a cache memory widely used for a
microprocessor, in a thread execution model for
simultaneously executing a plurality of threads having a
consecutive time sequence relationship (sequential order) in the
same memory space, data anti-dependence on the memory
can be hidden from software and thus more efficient parallel
processing can be performed.

In order to obtain the same effect by extending the 
conventional system, exclusively used hardware is needed 
and hardware costs are increased. According to the present 
invention, however, this problem is solved by using the 
cache memory. 

The present invention is also effective in that the necessity 
of comparing all the bits indicating addresses with one
another can be eliminated by using the cache memory. 
Moreover, according to the present invention, coexistence 
with a thread execution mode having no time sequence 
relationship can be facilitated. For data dependence, the 
present invention makes no mention of solving a problem by 
hardware. However, data pro-dependence is a problem
intrinsic to parallel processing, which is to be solved by a 
synchronizing mechanism. Accordingly, a hardware con- 
figuration can be simplified. 

While the invention has been described in terms of several 
preferred embodiments, those skilled in the art will
recognize that the invention can be practiced with modification
within the spirit and scope of the appended claims. 
What is claimed is: 

1. A cache coherency controller in a multi-processor 
system, said multi-processor system comprising: 

a plurality of processors; 

a cache memory for each of said plurality of processors; 
a thread sequence information table for holding orders of 

threads executed by said multi-processor system; 
a comparator for determining a sequential relationship 
among threads based on said thread sequence informa- 
tion table; and 
a cache coherency maintenance protocol sequencer for 
maintaining coherency by supplying a first cache line
including data produced by a preceding thread to a
second cache line including data produced by a
succeeding thread following said preceding thread based
on a result of determination performed by said
comparator and preventing said second cache line including
data produced by said succeeding thread from being 
supplied to said first cache line including data produced 
by said preceding thread, 

said cache coherency maintenance protocol sequencer, 
when a read miss occurs in corresponding cache 
memory, loads data modified by a preceding thread 
in a corresponding cache memory if said data is 
included in another cache memory, and 
said cache coherency maintenance protocol sequencer, 
when a writing operation occurs in a cache memory 
other than in corresponding cache memory, loads 
write data in said corresponding cache memory if 
said writing operation occurred by a preceding 
thread. 

2. A cache coherency controller in a multi-processor 
system, said multi-processor system comprising:

a plurality of processors;

a thread management unit for allocating thread identifiers in
generation order of threads executed on said plurality
of processors, said thread management unit including a
thread sequence information table holding orders of
threads executed by said multi-processor system;

a comparator for determining a sequential relationship 
among threads based on said thread sequence informa- 
tion table; 

a cache memory provided corresponding to each of said 
plurality of processors, said cache coherency controller 
for controlling cache coherency maintenance of said 
cache memory; 

a main memory connected to said cache coherency
controller and divided into a plurality of memory lines; and

a cache coherency maintenance protocol sequencer for 
maintaining coherency by supplying a first cache line 
including data produced by a preceding thread to a 
second cache line including data produced by a
succeeding thread following said preceding thread based
on a result of determination performed by said
comparator and preventing said second cache line including
data produced by said succeeding thread from being
supplied to said first cache line including data produced
by said preceding thread, said cache coherency
maintenance protocol sequencer including a directory table
for holding status of each of said cache memories 
corresponding to said memory lines of said main 
memory. 

3. The cache coherency controller according to claim 2,
wherein: 

said cache coherency maintenance protocol sequencer, 
when a read miss occurs in corresponding cache 
memory, loads data modified by a preceding thread in 
a corresponding cache memory if said data is included
in another cache memory; and 

said cache coherency maintenance protocol sequencer, 
when a writing operation occurs in a cache memory 
other than in corresponding cache memory, loads write 
data in said corresponding cache memory if said writ- 
ing operation occurred by a preceding thread. 

4. The cache coherency controller according to claim 3, 
said directory table comprising: 

a valid bit for indicating existence of a memory line copy 

in said corresponding cache memory; and 
a dirty bit for indicating existence of a modified memory 

line copy in said corresponding cache memory. 

5. A cache coherency controller in a multi-processor 
system, said multi-processor system comprising: 

a plurality of processors; 

a thread management unit for allocating thread identifiers 
in generation order of threads executed on said plurality 
of processors, said thread management unit including a 
thread sequence information table for holding orders of
threads executed by said multi-processor system; 

a comparator for determining a sequential relationship 
among threads based on said thread sequence informa- 
tion table; 

a cache memory shared by said plurality of processors, 
said cache coherency controller for controlling cache 
coherency maintenance of said cache memory;
a main memory connected to said cache memory; and
a cache coherency maintenance protocol sequencer for 
maintaining coherency by supplying a first cache line 
including data produced by a preceding thread to a 
second cache line including data produced by a suc- 
ceeding thread following said preceding thread based 
on a result of determination performed by said com- 
parator and preventing said second cache line including 
data produced by said succeeding thread from being
supplied to said first cache line including data produced 
by said preceding thread. 

6. The cache coherency controller according to claim 5, 
wherein said cache memory holds a thread identifier and a 
status bit for each cache line and allocates a cache line for 
each thread. 

7. A cache coherency controller in a multi-processor 
system having a plurality of processors, comprising: 

a thread sequence information table for holding orders of 
threads executed by said multi-processor system; 

a comparator for determining a sequential relationship 
among threads based on said thread sequence informa- 
tion table; and 

a cache coherency maintenance protocol sequencer for 
maintaining coherency by supplying a first cache line 
including data produced by a preceding thread to a 
second cache line including data produced by a suc- 
ceeding thread following said preceding thread based 
on a result of determination performed by said com- 
parator and preventing said second cache line including 
data produced by said succeeding thread from being
supplied to said first cache line including data produced 
by said preceding thread; 

wherein a cache line comprises a write mask, said write 
mask is set when writing is performed by a thread to 
which a cache line belongs, and writing in a cache 
line is prohibited if said write mask is set when said 
writing is performed in said cache line of a succeed- 
ing thread by a preceding thread. 

8. A cache coherency controller in a multi-processor 
system having a plurality of processors, comprising: 

a thread sequence information table for holding orders of 
threads executed by said multi-processor system; 

a comparator for determining a sequential relationship 
among threads based on said thread sequence informa- 
tion table; and 

a cache coherency maintenance protocol sequencer for 
maintaining coherency by supplying a first cache line 
including data produced by a preceding thread to a 
second cache line including data produced by a suc- 
ceeding thread following said preceding thread based 
on a result of determination performed by said com- 
parator and preventing said second cache line including 
data produced by said succeeding thread from being 
supplied to said first cache line including data produced 
by said preceding thread, said cache coherency main- 
tenance protocol sequencer further comprising a mode 
for propagating writing to all cache memories holding 
identical addresses. 

* * * * *